74 research outputs found

    Adaptive Cache Mode Selection for Queries over Raw Data

    Caching intermediate query results for future reuse is a common technique for improving the performance of analytics over raw data sources. An important design choice in this regard is whether to lazily cache only the offsets of satisfying tuples, or to eagerly cache the entire tuples. Lazily cached offsets have the benefit of a smaller memory requirement and lower initial caching overhead, but they are much more expensive to reuse. In this paper, we explore this tradeoff and show that neither the lazy nor the eager caching mode is optimal for all situations. Instead, the ideal caching mode depends on the workload, the dataset and the cache size. We further show that choosing the sub-optimal caching mode can result in a performance penalty of over 200%. We solve this problem using an adaptive online approach that uses information about query history, cache behavior and cache size to choose the optimal caching mode automatically. Experiments on TPC-H based workloads show that our approach enables execution time to differ by, at most, 16% from the optimal caching mode, and by just 4% on average.
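
The lazy versus eager distinction in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the in-memory `RAW_LINES` source, the use of line indices as stand-ins for file offsets, and all function names are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

# Hypothetical raw data source: CSV-like lines that must be parsed before use.
RAW_LINES = ["1,apple,3.5", "2,pear,1.0", "3,plum,7.25", "4,fig,9.0"]

Row = Tuple[int, str, float]


def parse(line: str) -> Row:
    """Parse one raw line into a typed tuple (the costly step on raw data)."""
    key, name, price = line.split(",")
    return int(key), name, float(price)


def eager_cache(pred: Callable[[Row], bool]) -> List[Row]:
    """Eager mode: keep the full parsed tuples that satisfy the predicate."""
    return [row for row in map(parse, RAW_LINES) if pred(row)]


def lazy_cache(pred: Callable[[Row], bool]) -> List[int]:
    """Lazy mode: keep only the offsets (here, line indices) of satisfying tuples."""
    return [i for i, line in enumerate(RAW_LINES) if pred(parse(line))]


def reuse_lazy(offsets: List[int]) -> List[Row]:
    """Reusing a lazy entry means returning to the raw data and re-parsing."""
    return [parse(RAW_LINES[i]) for i in offsets]


def is_expensive(row: Row) -> bool:
    """Example selection predicate: price above 5.0."""
    return row[2] > 5.0


eager_entry = eager_cache(is_expensive)  # larger entry, cheap to reuse
lazy_entry = lazy_cache(is_expensive)    # small entry, pays parsing again on reuse
```

In this sketch the lazy entry holds two integers while the eager entry holds two full tuples, and `reuse_lazy` repeats the parsing work that the eager entry avoids, which is the tradeoff the abstract describes.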

    ReCache: Reactive Caching for Fast Analytics over Heterogeneous Data

    As data continues to be generated at exponentially growing rates in heterogeneous formats, fast analytics to extract meaningful information is becoming increasingly important. Systems widely use in-memory caching as one of their primary techniques to speed up data analytics. However, caches in data analytics systems cannot rely on simple caching policies and a fixed data layout to achieve good performance. Different datasets and workloads require different layouts and policies to achieve optimal performance. This paper presents ReCache, a cache-based performance accelerator that is reactive to the cost and heterogeneity of diverse raw data formats. Using timing measurements of caching operations and selection operators in a query plan, ReCache accounts for the widely varying costs of reading, parsing, and caching data in nested and tabular formats. Combining these measurements with information about frequently accessed data fields in the workload, ReCache automatically decides whether a nested or relational column-oriented layout would lead to better query performance. Furthermore, ReCache keeps track of commonly utilized operators to make informed cache admission and eviction decisions. Experiments on synthetic and real-world datasets show that our caching techniques decrease caching overhead for individual queries by an average of 59%. Furthermore, over the entire workload, ReCache reduces execution time by 19-75% compared to existing techniques

    The Clarens Web Service Framework for Distributed Scientific Analysis in Grid Projects

    Large scientific collaborations are moving towards service-oriented architectures for the implementation and deployment of globally distributed systems. Clarens is a high-performance, easy-to-deploy Web Service framework that supports the construction of such globally distributed systems. This paper discusses some of the core functionality of Clarens that the authors believe is important for building distributed systems based on Web Services that support scientific analysis.

    N-[(Methylsulfanyl)methyl]benzamide

    In the title compound, C9H11NOS, the phenyl ring and formamide unit make a dihedral angle of 23.93 (14)°, whereas the (methylsulfanyl)methyl group is oriented at a dihedral angle of 61.31 (8)° with respect to the phenyl ring. There are intermolecular N—H⋯O hydrogen bonds, forming C(4) chains along the [010] direction. These polymeric chains are linked by C—H⋯O hydrogen bonds to form polymeric sheets in the (110) plane.

    Potential of eucalyptus plantation in Malaysia

    Eucalyptus is known as a fast-growing genus and is recognised as a potential plantation species. Eucalyptus appears to be the genus with the greatest potential to provide supplemental fiber in many parts of the world. Eucalyptus species have been planted in several areas in Peninsular Malaysia, Sabah and Sarawak for several decades, with mixed results. A study was undertaken to identify the drivers, motivations, issues and challenges faced in planting Eucalyptus in Malaysia. The main motivation for growing and processing Eucalyptus was that it is perceived as a potentially profitable species. Pulp and paper, sawn timber, laminated timber, woodchip, plywood and veneer were identified as the preferred final products. Planters and industry players had no issues or problems with the supply of Eucalyptus seedlings, as almost all of them had their own orchard or nursery. All the industry players are of the opinion that the government should support the establishment of a Eucalyptus plantation industry by providing financial assistance and by promoting more research and development activities to ensure the sustainability of the trees as well as the wood quality. The study revealed that Eucalyptus planters are not dependent on the local timber market, and that the export market offers much greater possibilities. Eucalyptus is expected to progress as an important species in Malaysia, with future opportunities for growth.

    Distributed Analysis and Load Balancing System for Grid Enabled Analysis on Hand-held devices using Multi-Agents Systems

    Handheld devices, while growing rapidly in number, are inherently constrained and lack the capability to execute resource-hungry applications. This paper presents the design and implementation of a distributed analysis and load-balancing system for hand-held devices using a multi-agent system. This system enables low-resource mobile handheld devices to act as potential clients for Grid-enabled applications and analysis environments. We propose a system in which mobile agents transport, schedule, execute and return results for heavy computational jobs submitted by handheld devices. In this way, our system provides a high-throughput computing environment for hand-held devices. Comment: 4 pages, 3 figures. Proceedings of the 3rd International Conference on Grid and Cooperative Computing (GCC 2004)

    Multidisciplinary Management and Outcome of Intradural Extramedullary Spinal Tumors

    Introduction/Objective: About fifteen percent of primary CNS tumors are intraspinal. About two-thirds of tumors are intradural extramedullary (IDEM). This study was conducted to review the outcome of operative management of intradural extramedullary tumors in correlation with the clinical and histopathological factors that influence patients' neurology and prognosis. Materials and Methods: This multicenter study, conducted from December 2018 to December 2020, included 42 patients. All patients were diagnosed by MRI with and without contrast. Patients were surgically treated and analyzed for clinical features, i.e., pain by visual analog scale (VAS) and neurology by modified McCormick scale, both preoperatively and postoperatively. Clinical features and outcomes were correlated with tumor size and histopathology. A p-value < 0.05 was considered significant. Results: This study included 42 cases. The most common diagnosis was schwannoma (76.19%). The average intradural space occupied at presentation was 82%. The most common location was dorsal (90.4%). The VAS score for pain improved in all patients postoperatively from 7 ± 1.9 to 2 ± 0.8 (p = 0.003), and the modified McCormick scale from 3.0 ± 1.3 to 2.0 ± 1.0 (p = 0.005). The preoperative symptoms were correlated only with the size of the tumor occupying the intradural space (VAS p = 0.021, modified McCormick scale p = 0.018). Conclusion: All patients whose tumors were excised showed some improvement in neurological status. Therefore, all patients diagnosed with IDEM tumors should be operated on, even if they present with prolonged symptoms or severe neurological compromise. Keywords: Intradural Extramedullary, Meningioma, Schwannoma, Intraspinal

    Heterogeneous Relational Databases for a Grid-enabled Analysis Environment

    Grid based systems require a database access mechanism that can provide seamless homogeneous access to the requested data through a virtual data access system, i.e. a system which can take care of tracking the data that is stored in geographically distributed heterogeneous databases. This system should provide an integrated view of the data that is stored in the different repositories by using a virtual data access mechanism, i.e. a mechanism which can hide the heterogeneity of the backend databases from the client applications. This paper focuses on accessing data stored in disparate relational databases through a web service interface, and exploits the features of a Data Warehouse and Data Marts. We present a middleware that enables applications to access data stored in geographically distributed relational databases without being aware of their physical locations and underlying schema. A web service interface is provided to enable applications to access this middleware in a language and platform independent way. A prototype implementation was created based on Clarens [4], Unity [7] and POOL [8]. This ability to access the data stored in the distributed relational databases transparently is likely to be a very powerful one for Grid users, especially the scientific community wishing to collate and analyze data distributed over the Grid
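
The idea of hiding the physical location of data behind a single access layer, as described in the abstract above, can be sketched as follows. This is only an illustration under stated assumptions: the `VirtualDataAccess` class, its methods, and the two in-memory SQLite databases standing in for geographically distributed backends are all hypothetical, and none of this reflects the actual Clarens, Unity or POOL interfaces.

```python
import sqlite3
from typing import Dict, List, Tuple


class VirtualDataAccess:
    """Hypothetical catalog mapping logical table names to backend databases."""

    def __init__(self) -> None:
        self._catalog: Dict[str, sqlite3.Connection] = {}

    def register(self, table: str, conn: sqlite3.Connection) -> None:
        """Record which backend holds the given logical table."""
        self._catalog[table] = conn

    def select_all(self, table: str) -> List[Tuple]:
        """Fetch all rows of a table without the caller naming a backend."""
        conn = self._catalog[table]  # raises KeyError for unregistered tables
        # The name was validated against the catalog above, so it is quoted
        # and interpolated here; rows are ordered by the first column.
        return conn.execute(f'SELECT * FROM "{table}" ORDER BY 1').fetchall()


# Two in-memory databases stand in for remote, independently managed sites.
site_a = sqlite3.connect(":memory:")
site_a.execute("CREATE TABLE runs (id INTEGER, label TEXT)")
site_a.executemany("INSERT INTO runs VALUES (?, ?)", [(1, "calib"), (2, "physics")])

site_b = sqlite3.connect(":memory:")
site_b.execute("CREATE TABLE events (id INTEGER, run_id INTEGER)")
site_b.executemany("INSERT INTO events VALUES (?, ?)", [(10, 1), (11, 2)])

vda = VirtualDataAccess()
vda.register("runs", site_a)
vda.register("events", site_b)

runs = vda.select_all("runs")      # the client never refers to site_a
events = vda.select_all("events")  # nor to site_b
```

A real middleware of the kind the abstract describes would also have to reconcile differing schemas and expose the catalog through a web service interface; the sketch covers only location transparency.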